IFM File Data

Currently, the IFM data resides in a file of its own, whose name ends in .ifm (for example, dvnc.ifm). I still hope that I may some day be able to include the IFM data in the Adobe AFM file. Hence, all the lines in the IFM file start with the word Comment, so that regular Adobe programs that scan the AFM file can skip over this data. The marker -I- follows the word Comment and marks the line as containing legal metric data for the Indian language.

Every line in the IFM file consists of semicolon-separated fields. Each line describes a composite alphabet form, such as a consonant-vowel form, a consonant-consonant form, a vowel form, or a half form of a consonant. The English words used to specify the characters of the Indian language alphabet are meant to sound right, in some vague manner, and are usually self-descriptive. These English words (such as ii, aa, aha) have no relation to what the user types to get the required character; that mapping is defined by the lexer input file, ilex.l.

Every field in a line in the IFM file starts off with an opcode, describing what to expect next.

The following lines are representative of the data in the IFM file:

Comment -I- StartINDIAN
Comment -I- FONT marathi dmta.afm
Comment -I- PROP 1 ligatures ;
Comment -I- CC a 2 ; PCC 97 0 0 ; PCC 129 -70 0 ;
Comment -I- CC gha-ii 3 ; PCC implicit 0 0 ; PCC 129 -70 0 ; PCC 132 0 0 ;
Comment -I- CCS gha ga ;
Comment -I- CCADD tmplA ;
Comment -I- EndINDIAN

The opcodes StartINDIAN and EndINDIAN bracket the Indian language character description data.
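
As a rough illustration of the line layout described above, the following Python sketch splits one IFM line into its fields. It is not taken from the program itself: the function name parse_ifm_line and the (opcode, arguments) representation are assumptions made for this example.

```python
# Hypothetical helper, not part of the original program.
PREFIX = ("Comment", "-I-")


def parse_ifm_line(line):
    """Return a list of (opcode, args) fields, or None for a non-IFM line."""
    tokens = line.split()
    if tuple(tokens[:2]) != PREFIX:
        return None  # an ordinary comment or other AFM line: skip it
    body = " ".join(tokens[2:])
    fields = []
    for chunk in body.split(";"):
        parts = chunk.split()
        if parts:  # ignore the empty chunk after a trailing semicolon
            fields.append((parts[0], parts[1:]))
    return fields


line = "Comment -I- CC gha-ii 3 ; PCC implicit 0 0 ; PCC 129 -70 0 ; PCC 132 0 0 ;"
fields = parse_ifm_line(line)
for opcode, args in fields:
    print(opcode, args)
```

Each field starts with its opcode, so the first token of every chunk is kept separately from the remaining arguments.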

The FONT opcode gives the name of the font and the name of the file that contains the AFM data for this font. If the font is a Metafont description, then the TFM file name is specified here instead. The TFM file name is used only as an indication that the font is a TeX font; the file is not actually opened or read. Thus, for a Metafont font you could just as well say ``junk.tfm'' in the FONT opcode: only the extension ``.tfm'' matters.

The PROP opcode defines a property of the font. Currently, only one property is recognized: whether the language uses ligatures. The keyword ligatures enables the ligature mechanism, and the keyword no_ligatures turns it off. Devanagari script uses ligatures, while Tamil does not.

The CC opcode stands for composite character, and it defines how to construct the given character using the font. In the example above, the a character is made up of two units: char code 97 and char code 129. See the previous section regarding the PCC opcode. The second CC line above describes how to create the ii vowel form of the consonant gha. The first PCC character code in that line is not a number but the word implicit. This requires that a special character called gha-implicit be defined earlier. Semantically, gha-implicit means the implicit form of the consonant gha. Thus, whenever the code implicit appears in the description of some form of a consonant xxx, the program inserts the definition of xxx-implicit at that point. (Naturally, the description of xxx-implicit cannot itself contain the code implicit.)
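
The substitution of the implicit code can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name expand, the tuple layout (code followed by the two numbers from the PCC field), and the contents of gha-implicit are all hypothetical, since the sample lines do not show the gha-implicit definition. The sketch handles only names of the form consonant-vowel, and it ignores the two numbers attached to the implicit part.

```python
def expand(name, defs):
    """Return the PCC parts for `name`, with each 'implicit' part replaced
    by the parts of the matching xxx-implicit definition."""
    consonant = name.split("-")[0]  # assumes a simple consonant-vowel name
    parts = []
    for code, first, second in defs[name]:
        if code == "implicit":
            # Insert the whole definition of xxx-implicit at this point.
            parts.extend(defs[consonant + "-implicit"])
        else:
            parts.append((code, first, second))
    return parts


defs = {
    # Hypothetical definition: the sample IFM lines do not give gha-implicit.
    "gha-implicit": [(103, 0, 0)],
    # Taken from the sample line: CC gha-ii 3 ; PCC implicit 0 0 ; ...
    "gha-ii": [("implicit", 0, 0), (129, -70, 0), (132, 0, 0)],
}

print(expand("gha-ii", defs))
```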

Most consonants are similar in the manner in which their vowel forms are constructed, using the implicit form of the consonant and a few other character forms. For example, kha-aa, gha-aa, da-aa, and dda-aa are all constructed by using this description: PCC implicit 0 0 ; PCC 130 -70 0 ; So, instead of restating this description for all these consonants, the CCS keyword can be used. The CCS keyword assigns equivalences. The line

CCS xxx yyy ;

states that a given consonant xxx is similar to an already defined consonant yyy: if the description of some vowel form xxx-x is missing, the program looks up the description of yyy-x and uses that instead. This chaining can be made as deep as necessary:

CCS bbb aaa ;

CCS ccc bbb ;

CCS ddd ccc ;

etc
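
The chained lookup can be pictured with the following Python sketch. It is an illustration only, not the program's actual code: the function name lookup, the dictionary of CCS links, and the guard against circular chains are assumptions made for this example.

```python
def lookup(consonant, form, defined, similar):
    """Return the name of the definition used for consonant-form,
    following CCS links while the form is missing, or None if not found."""
    seen = set()  # assumption: stop quietly if the CCS links form a loop
    while consonant is not None and consonant not in seen:
        seen.add(consonant)
        key = f"{consonant}-{form}"
        if key in defined:
            return key
        consonant = similar.get(consonant)  # move one step along the chain
    return None


# CCS bbb aaa ;  CCS ccc bbb ;  CCS ddd ccc ;
similar = {"bbb": "aaa", "ccc": "bbb", "ddd": "ccc"}
# Names of the forms that have their own CC line (made up for this example).
defined = {"aaa-aa", "ccc-ii"}

print(lookup("ddd", "aa", defined, similar))  # aaa-aa
print(lookup("ddd", "ii", defined, similar))  # ccc-ii
print(lookup("ddd", "uu", defined, similar))  # None
```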

CCS also works similarly for ligature forms:

CCS ga-ra tmplC ;

which states that the ga-ra ligature should use the form of the tmplC dummy consonant (dummy consonants are explained further in this section). Note that it is usually dangerous to specify ligature equivalence to one of the constituent consonants, since most of the consonants do not use the codename "implicit" in all their form definitions. Thus, be careful of such definitions:

CCS ga-ra ga ;

CCS ga-ra ra ;

This will create problems if the half form of the ga-ra ligature is required. Both ga and ra (and every other consonant too) have a half-form definition that does not include the code "implicit". Hence, instead of the ga-ra-half form, what prints is just the ga-half form or the ra-half form, depending on which of the two CCS lines above is present in the IFM file. One way around this problem is to define the half form of the ga-ra ligature explicitly, which stops the program from looking for it through the CCS chain. It is usually only the half form that causes problems (as of October 1991), since all the other forms do contain the "implicit" code in their definitions.
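
The half-form pitfall and its workaround can be reproduced with a small Python sketch. The function resolve and the sets used here are hypothetical stand-ins for the program's lookup, kept deliberately minimal (there is no protection against circular CCS links).

```python
def resolve(name, form, defined, similar):
    """Return the definition name used for name-form, following CCS links."""
    while name is not None:
        key = f"{name}-{form}"
        if key in defined:
            return key
        name = similar.get(name)
    return None


defined = {"ga-half", "ra-half"}  # every consonant has its own half form
similar = {"ga-ra": "ga"}         # CCS ga-ra ga ;

# The ligature's half form is missing, so the chain falls through to ga.
print(resolve("ga-ra", "half", defined, similar))  # ga-half

# Defining the ligature's half form explicitly stops the chain.
defined.add("ga-ra-half")
print(resolve("ga-ra", "half", defined, similar))  # ga-ra-half
```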

A special consonant form, ``*'', is available for use with the CCS keyword for ligatures. It acts as a meta-character that stands for all the consonants:

CCS *-ra tmplC ;

CCS ga-* tmplC ;
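
One way to read the ``*'' form is as a pattern over ligature names, as in the Python sketch below. This is an assumption-laden illustration: the function name matches and the small consonant set are invented for the example, and the sketch says nothing about how the program orders wildcard lines against more specific CCS lines.

```python
def matches(pattern, ligature, consonants):
    """True if `ligature` fits `pattern`, where '*' stands for any consonant."""
    wanted = pattern.split("-")
    actual = ligature.split("-")
    if len(wanted) != len(actual):
        return False
    return all(
        (w == "*" and a in consonants) or w == a
        for w, a in zip(wanted, actual)
    )


consonants = {"ka", "ga", "gha", "ra"}  # hypothetical subset of the alphabet

print(matches("*-ra", "ga-ra", consonants))  # True
print(matches("*-ra", "ga-ka", consonants))  # False
print(matches("ga-*", "ga-ka", consonants))  # True
print(matches("ga-*", "ga-ii", consonants))  # False: ii is not a consonant
```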

Instead of defining equivalences between real consonants, additional dummy consonants may be created. These exist only in the IFM file and are used only as equivalence targets for real consonants. The CCADD line creates such a placeholder consonant:

CCADD tmplA ;

This states that the IFM file will make use of a dummy consonant called tmplA, after which all its vowel forms (including the half form) can be defined. See the file dvnc.ifm for a complete example. The only use of tmplA is to define equivalences; again in the dvnc.ifm file, you will see lines like:

CCS chha tmplA ;

CCS tta tmplA ;

which state that if some required vowel form of chha or tta is missing, then try to use the definition of the same vowel form in tmplA.

Each IFM file can make up to 10 such dummy consonants.